Abstract

The problem of determining the joint probability distributions for correlated random variables 
with pre-specified marginals is considered. When the joint distribution satisfying all the required 
conditions is not unique, the "most unbiased" choice corresponds to the distribution of maximum 
entropy. The calculation of the maximum entropy distribution requires the solution of rather 
complicated nonlinear coupled integral equations, exact solutions to which are obtained for the 
case of Gaussian marginals; otherwise, the solution can be expressed as a perturbation around the 
product of the marginals if the marginal moments exist. 

Consider the situation in which we are given two random variables, say $X_1 \in I_1$ and $X_2 \in I_2$, which
we know to be distributed as $P_1(X_1)$ and $P_2(X_2)$, respectively. Further, assume we know the variables
to be correlated; for example, assume we are given the covariance $\Gamma_{12} = \langle X_1 X_2\rangle - \langle X_1\rangle\langle X_2\rangle \neq 0$.
We are now required to construct the joint probability distribution $P_{(1,2)}(X_1,X_2)$ with the prescribed
marginals, $P_1(X_1)$ and $P_2(X_2)$, and covariance $\Gamma_{12}$.

This and other similar problems arise in a wide variety of contexts, ranging from the description
of correlated financial instruments in economics [1] and EEG signals [2, 3] in medicine, to systems
out of equilibrium in statistical mechanics [4, 5, 6, 7], to name but very few. Actually, in finance
and other fields of intense applied statistics [8, 9, 10, 11], it has become popular to describe
interdependent random variables with given marginals using "copulas". The idea there is that the
"interdependence" of, say, N random variables described by cumulative marginal distributions $F_i(X_i)$
is encoded in the N-dimensional cumulative distribution function with uniform marginals, the
copula: $C(u_1, \ldots, u_N): [0,1]^N \to [0,1]$, with $C(1, \ldots, 1, u_j, 1, \ldots, 1) = u_j$. The complete description is achieved
through the joint cumulative distribution $F(X_1, \ldots, X_N) = C(F_1(X_1), \ldots, F_N(X_N))$. This approach treats
the individual statistics of the random variables, the marginals, separately from the interdependence 
of said variables, allowing, for example, to change the marginals keeping the interdependence, the 
copula, fixed. These tools are extremely powerful and general, but hard to estimate directly from data. 
Another characterization of data interdependence relates to whether there is a causal relation between
the variables, which can be tested along the lines originally proposed by Granger in the context of
econometrics [12]. In contrast to these more sophisticated methods, and while far from a complete
description of the interdependence structure of much data, the correlation between two random
variables is a frequently and easily measurable quantity, extensively used in many disciplines, including,
of course, physics.
As it stands, however, the problem may be ill posed, since there could be infinitely many
distributions (or none at all) satisfying the conditions of the prespecified marginals and given covariance. To
lift the ambiguity, when it arises, we follow Jaynes [13] and require the joint distribution function to be
that which maximizes the relative entropy or, equivalently, minimizes the discrimination information
over the product of the marginals. This choice is, as argued by Jaynes, the "least biased" distribution
which is consistent with the restrictions: "the maximization of entropy is [...] a method of reasoning
which ensures that no unconscious assumptions have been introduced" [13]. To this point, then, the
problem is formally straightforward: we need to find the extremum of the entropy functional subject to
the appropriate restrictions. That is, we require $P_{(1,2)}(X_1,X_2)$ such that

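$$\delta\left[\int_{I_1\times I_2} P_{(1,2)}(X_1,X_2)\ln\left(\frac{P_{(1,2)}(X_1,X_2)}{P_1(X_1)P_2(X_2)}\right)dX_1\,dX_2 + \lambda_{12}\int_{I_1\times I_2} X_1X_2\,P_{(1,2)}(X_1,X_2)\,dX_1\,dX_2\right.$$

$$\left. +\int_{I_1\times I_2}\left(a(X_1)+b(X_2)\right)P_{(1,2)}(X_1,X_2)\,dX_1\,dX_2\right] = 0 \qquad (1)$$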



(we do not need to condition the distribution to be normalized, as the marginals are assumed to be 
already normalized). The required distribution can be written as 

$$P_{(1,2)}(X_1,X_2) = P_1(X_1)P_2(X_2)\,e^{-\lambda_{12}X_1X_2 - a(X_1) - b(X_2) - 1} = P_1(X_1)P_2(X_2)\,\mathcal{A}(X_1)\,\mathcal{B}(X_2)\,e^{-\lambda_{12}X_1X_2}. \qquad (2)$$

The Lagrange multipliers $a(X_1)$, $b(X_2)$ (or equivalently, the functions $\mathcal{A}(X_1)$ and $\mathcal{B}(X_2)$), and the
constant $\lambda_{12}$, are chosen to enforce the restrictions, which result in the set of coupled nonlinear integral
equations



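$$1 = \mathcal{A}(X_1)\int_{I_2} P_2(X_2)\,\mathcal{B}(X_2)\,e^{-\lambda_{12}X_1X_2}\,dX_2, \qquad X_1 \in I_1$$

$$1 = \mathcal{B}(X_2)\int_{I_1} P_1(X_1)\,\mathcal{A}(X_1)\,e^{-\lambda_{12}X_1X_2}\,dX_1, \qquad X_2 \in I_2 \qquad (3)$$

plus a condition on the value of $\lambda_{12}$:

$$\int_{I_1\times I_2} X_1X_2\,P_1(X_1)P_2(X_2)\,\mathcal{A}(X_1)\,\mathcal{B}(X_2)\,e^{-\lambda_{12}X_1X_2}\,dX_1\,dX_2 = \Gamma_{12} + \langle X_1\rangle\langle X_2\rangle \qquad (4)$$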

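where the mean values $\langle X_1\rangle$ and $\langle X_2\rangle$ are calculated from the corresponding marginals. The above
equations may be rewritten in the slightly more compact form

$$P_1(X_1) = Q_1^{(2)}(X_1)\int_{I_2} Q_2^{(2)}(X_2)\,e^{-\lambda^{(2)}_{12}X_1X_2}\,dX_2, \qquad X_1 \in I_1$$

$$P_2(X_2) = Q_2^{(2)}(X_2)\int_{I_1} Q_1^{(2)}(X_1)\,e^{-\lambda^{(2)}_{12}X_1X_2}\,dX_1, \qquad X_2 \in I_2 \qquad (5)$$

and

$$\int_{I_1\times I_2} X_1X_2\,Q_1^{(2)}(X_1)\,Q_2^{(2)}(X_2)\,e^{-\lambda^{(2)}_{12}X_1X_2}\,dX_1\,dX_2 = \Gamma_{12} + \langle X_1\rangle\langle X_2\rangle \qquad (6)$$

where $Q_1^{(2)}(X_1) = P_1(X_1)\mathcal{A}(X_1)$, $Q_2^{(2)}(X_2) = P_2(X_2)\mathcal{B}(X_2)$ and $\lambda^{(2)}_{12} = \lambda_{12}$. The superscripts have been added to
indicate that these quantities are elements of the joint distribution of two variables.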

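Though far from opening the way to a full solution, it is worthwhile noting that equations (5) can
be decoupled by multiplying the first one, say, by $e^{-\lambda^{(2)}_{12}X_1Y}$ and integrating over $X_1$. Then, using the
second of equations (5) evaluated at $X_2 + Y$,

$$\tilde{P}_1(\lambda^{(2)}_{12}Y) \equiv \int_{I_1} P_1(X_1)\,e^{-\lambda^{(2)}_{12}X_1Y}\,dX_1 = \int_{I_2} Q_2^{(2)}(X_2)\int_{I_1} Q_1^{(2)}(X_1)\,e^{-\lambda^{(2)}_{12}X_1(X_2+Y)}\,dX_1\,dX_2$$

$$= \int_{I_2} \frac{Q_2^{(2)}(X_2)\,P_2(X_2+Y)}{Q_2^{(2)}(X_2+Y)}\,dX_2, \qquad (7)$$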



where $\tilde{P}_1(Y)$ is the Laplace transform of $P_1(X)$. This is a rather difficult nonlinear integral equation
that determines $Q_2^{(2)}(X_2)$, up to a multiplicative constant, in terms of the marginals and the covariance.

If the variables are discrete rather than continuous, similar expressions are obtained with
summations instead of integrals. Either way, the above equations turn out to be extremely hard to solve for
arbitrary marginals.
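Although closed forms are out of reach, the discrete version of the problem can be approached numerically. The following is only a minimal sketch, under assumptions that are not part of the text: for a fixed multiplier the two marginal conditions are enforced by alternating updates of the two unknown factors (an iterative proportional fitting scheme), and the multiplier is then tuned by bisection, relying on the covariance decreasing monotonically with it. The function names, the support points, the probabilities and the target covariance are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_marginals(lam, x1, p1, x2, p2, iters=2000, tol=1e-12):
    """Discrete analogue of the coupled marginal equations for a fixed lam.

    Alternately solves for the factors A(x1) and B(x2) so that
    P = p1*A * exp(-lam*x1*x2) * p2*B has marginals p1 and p2.
    """
    E = np.exp(-lam * np.outer(x1, x2))
    B = np.ones_like(p2)
    for _ in range(iters):
        A = 1.0 / (E @ (p2 * B))        # enforces the X1 marginal
        B = 1.0 / (E.T @ (p1 * A))      # enforces the X2 marginal
        P = (p1 * A)[:, None] * E * (p2 * B)[None, :]
        if np.abs(P.sum(axis=1) - p1).max() < tol:
            break
    return P

def covariance(P, x1, x2):
    """Covariance <X1 X2> - <X1><X2> of a joint pmf P on the grid x1 x x2."""
    return x1 @ P @ x2 - (P.sum(axis=1) @ x1) * (P.sum(axis=0) @ x2)

def maxent_joint(x1, p1, x2, p2, cov, lam_lo=-5.0, lam_hi=5.0, steps=60):
    """Bisection on lam; assumes cov is attainable inside [lam_lo, lam_hi]."""
    for _ in range(steps):
        lam = 0.5 * (lam_lo + lam_hi)
        P = fit_marginals(lam, x1, p1, x2, p2)
        if covariance(P, x1, x2) > cov:
            lam_lo = lam                # covariance too large: increase lam
        else:
            lam_hi = lam
    return lam, P

# Hypothetical example data (not from the text).
x1 = np.array([0.0, 1.0, 2.0]);       p1 = np.array([0.5, 0.3, 0.2])
x2 = np.array([-1.0, 0.0, 1.0, 2.0]); p2 = np.array([0.1, 0.4, 0.3, 0.2])
lam, P = maxent_joint(x1, p1, x2, p2, cov=0.2)
print(lam, covariance(P, x1, x2))
```

The bisection bracket is a design shortcut: if the requested covariance cannot be reached by any joint distribution with the given marginals, the loop simply runs into one end of the bracket, which is a crude numerical symptom of the existence problem.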

The generalization to more than two variables, while formally equally simple, gives rise to a rather
interesting situation. To illustrate this, consider the case of three variables, $X_1$, $X_2$ and $X_3$, with their
respective marginals $P_1(X_1)$, $P_2(X_2)$ and $P_3(X_3)$, and covariance matrix elements $\Gamma_{12}$, $\Gamma_{23}$ and $\Gamma_{13}$ (in
passing, note that only the off-diagonal components of the covariance matrix can be introduced as
constraints; the diagonal elements are fixed by the marginals). If we follow the procedure outlined
above for two variables, maximizing entropy relative to the product of the marginals, constrained
to the appropriate marginals and correlations, it is easy to see that the joint probability distribution
should be of the following form:



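$$P_{(1,2,3)}(X_1,X_2,X_3) = Q_1^{(3)}(X_1)\,Q_2^{(3)}(X_2)\,Q_3^{(3)}(X_3)\,e^{-\lambda^{(3)}_{12}X_1X_2-\lambda^{(3)}_{23}X_2X_3-\lambda^{(3)}_{13}X_1X_3} \qquad (8)$$

where the functions $Q_i^{(3)}(X_i)$ are related to the Lagrange multipliers that constrain the marginals (the superscript now labels elements of the three-variable distribution):

$$P_1(X_1) = Q_1^{(3)}(X_1)\int_{I_2\times I_3} Q_2^{(3)}(X_2)\,Q_3^{(3)}(X_3)\,e^{-\lambda^{(3)}_{12}X_1X_2-\lambda^{(3)}_{23}X_2X_3-\lambda^{(3)}_{13}X_1X_3}\,dX_2\,dX_3 \qquad (9)$$

and so on.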

Now the question arises as to whether it must also be true that the distribution obtained by
integrating $P_{(1,2,3)}(X_1,X_2,X_3)$ over $X_3$, say, must have the form of the constrained maximum entropy joint
distribution for two variables discussed above. That is, whether:

$$\int_{I_3} P_{(1,2,3)}(X_1,X_2,X_3)\,dX_3 = Q_1^{(3)}(X_1)\,Q_2^{(3)}(X_2)\,e^{-\lambda^{(3)}_{12}X_1X_2}\int_{I_3} Q_3^{(3)}(X_3)\,e^{-(\lambda^{(3)}_{23}X_2+\lambda^{(3)}_{13}X_1)X_3}\,dX_3$$

$$= Q_1^{(2)}(X_1)\,Q_2^{(2)}(X_2)\,e^{-\lambda^{(2)}_{12}X_1X_2} = P_{(1,2)}(X_1,X_2). \qquad (10)$$

If so, this would in turn imply that, independently of what the marginals $P_i(X_i)$ are, the integral that
appears above can always be resolved as

$$\int_{I_3} Q_3^{(3)}(X_3)\,e^{-(\lambda^{(3)}_{23}X_2+\lambda^{(3)}_{13}X_1)X_3}\,dX_3 = q_1^{(3)}(X_1)\,q_2^{(3)}(X_2)\,e^{-(\lambda^{(2)}_{12}-\lambda^{(3)}_{12})X_1X_2} \qquad (11)$$

in terms of which we can express $Q_1^{(2)}(X_1) = q_1^{(3)}(X_1)Q_1^{(3)}(X_1)$ and $Q_2^{(2)}(X_2) = q_2^{(3)}(X_2)Q_2^{(3)}(X_2)$.

While it turns out that for the simple cases considered in this paper eq. (10) is indeed satisfied,
I cannot find any reason why it should be true in general. Actually, let me consider another joint
distribution $\Pi_{(1,2,3)}(X_1,X_2,X_3)$, defined to be the maximum entropy distribution conditioned so that
integrating over any variable yields the corresponding two-point maximum entropy distributions
discussed above:



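$$0 = \delta\left[\int_{I_1\times I_2\times I_3} \Pi_{(1,2,3)}(X_1,X_2,X_3)\ln\left(\frac{\Pi_{(1,2,3)}(X_1,X_2,X_3)}{P_1(X_1)P_2(X_2)P_3(X_3)}\right)dX_1\,dX_2\,dX_3\right.$$

$$\left. + \int_{I_1\times I_2\times I_3}\left(a_{(1,2)}(X_1,X_2) + a_{(2,3)}(X_2,X_3) + a_{(1,3)}(X_1,X_3)\right)\Pi_{(1,2,3)}(X_1,X_2,X_3)\,dX_1\,dX_2\,dX_3\right]$$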

the solution to which can be written as 

$$\Pi_{(1,2,3)}(X_1,X_2,X_3) = F_{(1,2)}(X_1,X_2)\,F_{(2,3)}(X_2,X_3)\,F_{(1,3)}(X_1,X_3) \qquad (12)$$

where the functions $F_{(i,j)}(X_i,X_j)$ are simply related to the Lagrange multipliers $a_{(i,j)}(X_i,X_j)$ and
satisfy nonlinear integral equations that enforce the conditions imposed on $\Pi_{(1,2,3)}(X_1,X_2,X_3)$, namely

$$P_{(1,2)}(X_1,X_2) = F_{(1,2)}(X_1,X_2)\int_{I_3} F_{(2,3)}(X_2,X_3)\,F_{(1,3)}(X_1,X_3)\,dX_3 \qquad (13)$$

and so on. The conditions on the distributions $P_{(i,j)}(X_i,X_j)$ ensure that $\Pi_{(1,2,3)}(X_1,X_2,X_3)$ has the
correct one-point marginals $P_i(X_i)$ for $i = 1, 2, 3$, and covariances $\Gamma_{ij}$. It should be noted that $P_{(1,2,3)}(X_1,X_2,X_3)$,
as expressed in eq. (8), can be written in the form shown in eq. (12). However, the conditions on
$\Pi_{(1,2,3)}(X_1,X_2,X_3)$ are as restrictive as, or more restrictive than, those on $P_{(1,2,3)}(X_1,X_2,X_3)$; thus, it is not
necessarily the case that $P_{(1,2,3)}(X_1,X_2,X_3) = \Pi_{(1,2,3)}(X_1,X_2,X_3)$. Conversely, it does not appear to be
necessarily true that

$$P_{(1,2)}(X_1,X_2) = \int_{I_3} P_{(1,2,3)}(X_1,X_2,X_3)\,dX_3 \qquad (14)$$

etc., so that one could end with the somewhat uncomfortable situation in which the two point marginals 
obtained from the maximum entropy three point distribution may themselves not be maximum entropy 
two point distributions. 

We now turn to simple cases for which the required maximum entropy distributions can be
calculated explicitly. First, however, note that for the trivial case of "uncorrelated" variables (i.e. the case in which
$\Gamma_{ij} = 0$), the required maximum entropy joint distribution is indeed the product of the marginals.
The first nontrivial example is the case of two correlated random variables with Gaussian marginal
distributions, say:

$$P_1(X_1) = \sqrt{\frac{\alpha}{2\pi}}\;e^{-\alpha X_1^2/2} \qquad (15)$$

$$P_2(X_2) = \sqrt{\frac{\beta}{2\pi}}\;e^{-\beta X_2^2/2} \qquad (16)$$
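and let the correlation parameter be $\Gamma = \langle X_1X_2\rangle$. Then, to determine the joint distribution we need to
solve

$$\sqrt{\frac{\alpha}{2\pi}}\;e^{-\alpha X_1^2/2} = Q_1(X_1)\int_{-\infty}^{\infty} Q_2(X_2)\,e^{-\lambda X_1X_2}\,dX_2 \qquad (17)$$

$$\sqrt{\frac{\beta}{2\pi}}\;e^{-\beta X_2^2/2} = Q_2(X_2)\int_{-\infty}^{\infty} Q_1(X_1)\,e^{-\lambda X_1X_2}\,dX_1, \qquad (18)$$

where the sub- and super-indexes have been dropped for notational lightness. Substituting $Q_1(X_1)$ and
$Q_2(X_2)$ by Gaussians, it is easy to see that the required maximum entropy joint distribution is:

$$P_{(1,2)}(X_1,X_2) = \frac{1}{2\pi}\left(\frac{\alpha\beta}{1-\alpha\beta\Gamma^2}\right)^{1/2}\exp\left\{-\frac{1}{2(1-\alpha\beta\Gamma^2)}\left[\alpha X_1^2+\beta X_2^2-2\alpha\beta\Gamma X_1X_2\right]\right\} \qquad (19)$$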
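As a quick numerical sanity check of the Gaussian case, the sketch below builds the covariance matrix implied by zero-mean marginals of variances 1/α and 1/β together with ⟨X1 X2⟩ = Γ, inverts it, and compares the result with the closed-form quadratic form of a correlated bivariate Gaussian. The numerical values of `alpha`, `beta` and `gamma` are arbitrary illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative values (assumptions); a valid Gaussian needs alpha*beta*gamma**2 < 1.
alpha, beta, gamma = 2.0, 0.5, 0.4

# Covariance matrix fixed by the marginals (variances 1/alpha and 1/beta)
# and by the correlation parameter <X1 X2> = gamma.
sigma = np.array([[1.0 / alpha, gamma],
                  [gamma, 1.0 / beta]])
precision = np.linalg.inv(sigma)

# Closed form of the same matrix; the exponent is -(1/2) x^T precision x.
d = 1.0 - alpha * beta * gamma**2
closed_form = np.array([[alpha, -alpha * beta * gamma],
                        [-alpha * beta * gamma, beta]]) / d

# Coefficient of X1*X2 when the exponent is written as -lam*X1*X2 - ...
lam = precision[0, 1]

# Normalisation constant of the bivariate Gaussian density.
norm = np.sqrt(np.linalg.det(precision)) / (2.0 * np.pi)

print(np.allclose(precision, closed_form), lam, norm)
```

With these conventions a positive Γ corresponds to a negative multiplier, and the matrix stops being positive definite once αβΓ² reaches 1.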

Perhaps not unexpectedly, the maximum entropy joint distribution for more variables with Gaussian
marginals will again be a Gaussian distribution with appropriate correlations. At this point it is worth
mentioning that if we restrict ourselves to the class of continuous functions, then the set of joint
distributions having prespecified marginals and covariance is convex, and the concavity of the
entropy functional guarantees that the distribution at which it is maximized is unique. However, a more
difficult problem concerns whether distributions satisfying the requirements exist at all (note, for
example, that for large enough $\Gamma$ in eq. (19), the argument of the exponential changes sign, in which case
$P_{(1,2)}(X_1,X_2)$ cannot be interpreted as a probability distribution). Unfortunately, the general conditions
under which the set of distributions having the prescribed marginals and covariance is not empty are
not easy to establish [14]. Finally, as upon integration Gaussians beget Gaussians, for these
distributions eq. (14), as well as generalizations to more variables, will always hold.
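The statement that Gaussians marginalize consistently can be illustrated numerically. In the sketch below, which uses an arbitrary, hypothetical 3x3 covariance matrix, the coefficients of the cross terms in the exponent are the off-diagonal elements of the precision (inverse covariance) matrix. Integrating out X3 changes the X1 X2 coefficient by a Schur-complement term, yet the result is exactly the precision matrix of the two-variable Gaussian with the same variances and covariance.

```python
import numpy as np

# Hypothetical covariance matrix of (X1, X2, X3); diagonally dominant, hence valid.
sigma3 = np.array([[1.0, 0.3, 0.2],
                   [0.3, 2.0, -0.4],
                   [0.2, -0.4, 1.5]])

prec3 = np.linalg.inv(sigma3)          # three-variable quadratic form
prec2 = np.linalg.inv(sigma3[:2, :2])  # two-variable maximum entropy Gaussian

# Integrating exp(-x^T prec3 x / 2) over X3 leaves the Schur complement.
schur = prec3[:2, :2] - np.outer(prec3[:2, 2], prec3[2, :2]) / prec3[2, 2]

lam3_12 = prec3[0, 1]   # X1*X2 coefficient in the three-variable exponent
lam2_12 = prec2[0, 1]   # X1*X2 coefficient in the two-variable exponent
shift = -prec3[0, 2] * prec3[1, 2] / prec3[2, 2]  # produced by the X3 integral

print(np.allclose(prec2, schur), lam3_12, lam2_12, shift)
```

The two cross-term coefficients differ whenever X3 is correlated with both X1 and X2, which is why the multipliers of the two- and three-variable problems have to be distinguished.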

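Explicit expressions for maximum entropy joint distributions corresponding to non-Gaussian marginal
distributions appear to be very hard to obtain. However, a perturbation expansion in powers of the
parameter $\lambda$ can be carried out rather easily. Writing

$$Q_1(X_1) = P_1(X_1)\left(1 + f_1^{(1)}(X_1)\,\lambda + f_1^{(2)}(X_1)\,\lambda^2 + \ldots\right) \qquad (20)$$

$$Q_2(X_2) = P_2(X_2)\left(1 + f_2^{(1)}(X_2)\,\lambda + f_2^{(2)}(X_2)\,\lambda^2 + \ldots\right) \qquad (21)$$

in equations (5) and grouping powers of $\lambda$, after some rather messy algebra, one can write that, correct
to order $\lambda^2$,

$$P_{(1,2)}(X_1,X_2) = P_1(X_1)P_2(X_2)\times$$

$$\exp\left\{-\lambda\,(X_1-\langle X_1\rangle)(X_2-\langle X_2\rangle) - \frac{\lambda^2}{2}\left[(\langle X_2^2\rangle-\langle X_2\rangle^2)(X_1-\langle X_1\rangle)^2 + (\langle X_1^2\rangle-\langle X_1\rangle^2)(X_2-\langle X_2\rangle)^2 - (\langle X_1^2\rangle-\langle X_1\rangle^2)(\langle X_2^2\rangle-\langle X_2\rangle^2)\right]\right\}$$

where the averages, $\langle X_1\rangle$, $\langle X_1^2\rangle$, etc., are taken over the marginal distributions, assuming the moments
exist, and $\lambda$ is calculated, to leading order, using equation (6):

$$\lambda = -\frac{\langle X_1X_2\rangle-\langle X_1\rangle\langle X_2\rangle}{(\langle X_1^2\rangle-\langle X_1\rangle^2)(\langle X_2^2\rangle-\langle X_2\rangle^2)} \qquad (22)$$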

Clearly, writing the joint distribution as an exponential in eq. (22) is not really warranted, except by
the fact that it guarantees both positivity and integrability, and that it turns out to be slightly more
compact than might have been expected. Further, it also highlights the fact that the approximation
for $P_{(1,2)}(X_1,X_2)$ has the form required by eq. (2), corresponding to a maximum entropy distribution.
From this expression, approximate conditional distributions can be derived immediately, as well as
conditional expectations. Thus, for example, the conditional expectation of $X_1$ given $X_2$, to linear
order in $\lambda$, is

$$\langle X_1|X_2\rangle \approx \langle X_1\rangle - \lambda\,(\langle X_1^2\rangle-\langle X_1\rangle^2)(X_2-\langle X_2\rangle) + \ldots$$

$$= \langle X_1\rangle + \frac{(\langle X_1X_2\rangle-\langle X_1\rangle\langle X_2\rangle)(X_2-\langle X_2\rangle)}{\langle X_2^2\rangle-\langle X_2\rangle^2} + \ldots$$

Also, the excess entropy over the product of marginals is found to be 



$$\Delta s = \int_{I_1\times I_2} P_{(1,2)}(X_1,X_2)\ln\left(\frac{P_{(1,2)}(X_1,X_2)}{P_1(X_1)P_2(X_2)}\right)dX_1\,dX_2$$
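The perturbative expressions can be probed numerically on a small discrete example. The sketch below is only an illustration under stated assumptions: the support points, probabilities and target covariance are hypothetical; the multiplier is set to its leading-order value, minus the covariance divided by the product of the marginal variances; and the second-order exponent is the one obtained by requiring the marginals to be reproduced up to terms of third order in the multiplier. The leading-order estimate of the excess entropy used in the last line, half the squared multiplier times the product of the variances, is a rederivation and is not quoted from the text.

```python
import numpy as np

# Hypothetical skewed (non-Gaussian) discrete marginals and a weak target covariance.
x1 = np.array([0.0, 1.0, 2.0]);       p1 = np.array([0.5, 0.3, 0.2])
x2 = np.array([-1.0, 0.0, 1.0, 2.0]); p2 = np.array([0.1, 0.4, 0.3, 0.2])
target_cov = 0.05

m1, m2 = p1 @ x1, p2 @ x2                                # marginal means
v1, v2 = p1 @ (x1 - m1) ** 2, p2 @ (x2 - m2) ** 2        # marginal variances

lam = -target_cov / (v1 * v2)                            # leading-order multiplier

dx1 = (x1 - m1)[:, None]
dx2 = (x2 - m2)[None, :]
exponent = (-lam * dx1 * dx2
            - 0.5 * lam**2 * (v2 * dx1**2 + v1 * dx2**2 - v1 * v2))
P = np.outer(p1, p2) * np.exp(exponent)                  # approximate joint pmf

# How well are the constraints met by the truncated expansion?
marginal_error = max(np.abs(P.sum(axis=1) - p1).max(),
                     np.abs(P.sum(axis=0) - p2).max())
cov_check = np.sum(P * dx1 * dx2)

# Conditional expectation of X1 given X2 versus the linear-order formula.
cond_mean = (P * x1[:, None]).sum(axis=0) / P.sum(axis=0)
linear = m1 + target_cov * (x2 - m2) / v2

# Excess (relative) entropy versus its leading-order estimate.
delta_s = np.sum(P * np.log(P / np.outer(p1, p2)))
delta_s_leading = 0.5 * lam**2 * v1 * v2

print(marginal_error, cov_check, np.abs(cond_mean - linear).max(),
      delta_s, delta_s_leading)
```

The residuals printed here shrink as the target covariance is reduced, which is the expected behaviour of a truncated expansion; they are not a substitute for solving the full coupled equations.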



The three-point distribution can be obtained in the same way, but the result is too long and
unenlightening to include here. Nevertheless, it should be mentioned that at least to second order in the
perturbation parameter, equation (14) still holds (assuming that all the correlation coefficients can be
considered to be of linear order in the perturbation parameter).

In summary, the construction of maximum entropy joint probability distributions with the
prescribed marginals and covariance has been discussed. It should be noted that while there are other
convenient methods for constructing joint probability distributions with the prescribed marginals and
covariance [14], only when the entropy is maximized can we be sure that no extra, uncontrolled
assumptions have been introduced.

Extensions to even more variables are straightforward in principle, but the set of coupled equations
that result from the maximization of entropy is larger and harder to solve. The exception is, as
mentioned earlier, the case of Gaussian marginals with fixed correlations, for which the maximum
entropy distribution is the appropriate correlated Gaussian distribution. Also, the whole discussion can
be extended to the case in which the inter-relation among the variables is not encoded in the linear
correlation constant, but rather in other, more general moments; for example, the case of random
variables $X_1$ and $X_2$ with given marginals and the constraint $\langle f(X_1,X_2)\rangle = 0$. Another interesting
extension pertains to approximation schemes for the joint distribution, for example for the case in
which the marginals do not have a second moment, so that the perturbation expansion presented above is
not possible.

I am grateful to Ana Maria Contreras for her valuable comments on the manuscript, and to F. 
Leyvraz and S. Majumdar for useful discussions. Partial support through grant DGAPA-UNAM 
IN109111 is gratefully acknowledged. 